Practical Lessons for GenAI Evals | Chip Huyen & Vivienne Zhang

Update: 2024-12-04

Description

As AI agents and multimodal models become more prevalent, understanding how to evaluate GenAI is no longer optional – it's essential. 


Generative AI introduces new complexities in assessment compared to traditional software, and this week on Chain of Thought we’re joined by Chip Huyen (Storyteller, Tép Studio) and Vivienne Zhang (Senior Product Manager, Generative AI Software, Nvidia) for a discussion of AI evaluation best practices. 


Before we hear from our guests, Vikram Chatterji (CEO, Galileo) and Conor Bronsdon (Developer Awareness, Galileo) give their takes on the complexities of AI evals and how to overcome them by applying objective criteria to open-ended tasks, as well as on the role of hallucinations in AI models and the importance of human-in-the-loop systems.


Afterwards, Chip and Vivienne sit down with Atin Sanyal (Co-Founder & CTO, Galileo) to explore common evaluation approaches, best practices for building frameworks, and implementation lessons. They also discuss the nuances of evaluating AI coding assistants and agentic systems.



Chapters:
00:00 Challenges in Evaluating Generative AI
05:45 Evaluating AI Agents
13:08 Are Hallucinations Bad?
17:12 Human in the Loop Systems
20:49 Panel discussion begins
22:57 Challenges in Evaluating Intelligent Systems
24:37 User Feedback and Iterative Improvement
26:47 Post-Deployment Evaluations and Common Mistakes
28:52 Hallucinations in AI: Definitions and Challenges
34:17 Evaluating AI Coding Assistants
38:15 Agentic Systems: Use Cases and Evaluations
43:00 Trends in AI Models and Hardware
45:42 Future of AI in Enterprises
47:16 Conclusion and Final Thoughts


Follow:
Vikram Chatterji: https://www.linkedin.com/in/vikram-chatterji/
Atin Sanyal: https://www.linkedin.com/in/atinsanyal/
Conor Bronsdon: https://www.linkedin.com/in/conorbronsdon/
Chip Huyen: https://www.linkedin.com/in/chiphuyen/
Vivienne Zhang: https://www.linkedin.com/in/viviennejiaozhang/

Show notes:
Watch all of Productionize 2.0: https://www.galileo.ai/genai-productionize-2-0
